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Abstract. We extend the known tables of pseudosquares and pseu- 
docubes, discuss the implications of these new data on the conjectured 
distribution of pseudosquares and pseudocubes, and present the details of 
the algorithm used to do this work. Our algorithm is based on the space- 
saving wheel data structure combined with doubly-focused enumeration, 
run in parallel on a cluster supercomputer. 



1 Introduction 



It is well-known that testing for primality can be done in polynomial time [113] . 
However, the fastest known deterministic algorithms are conjectured to be the 
pseudosquares prime test of Lukes, Patterson, and Williams [5], and its gener- 
alization, the pseudocube prime test of Berrizbeitia, Miiller, and Willimas 
both of which run in roughly cubic time, if a sufficiently large pseudosquare or 
pseudocube is available. In particular, the pseudosquares prime test is very useful 
in the context of finding all primes in an interval [5], where sieving can be used 
in place of trial division. This, then, motivates the search for larger and larger 
peudosquares and pseudocubes, and attempts to predict their distribution. See, 
for example. Wooding and Williams [12] and also |7I11I8I2|I0] . 

In this paper, we present extensions to the known tables of pseudosquares 
and pseudocubes in ^ We discuss the implications of this new data on the con- 
jectured distribution of pseudosquares and pseudocubes in 331 and give a minor 
refinement of the current conjectures. Then we describe our parallel algorithm, 
based on Bernstein's doubly- focused enumeration [2] , which is used in a way sim- 
ilar, but not identical to the work of Wooding and Williams [H], combined with 
the space-saving wheel data structure presented in jS] §4.1]. We then suggest 
ideas for future work in 95l 



* Supported by a grant from the Holcomb Awards Committe, and computing resources 
provided by the Frank Levinson Supercomputing Center at Butler University. 



2 Computational Results 



Let {x/y) denote the Legendre symbol [5]. For an odd prime p, let Lp,2, the 
pseudosquare for p, be the smallest positive integer such that 

1. ip,2 = 1 (mod 8), 

2. (Lp^2/q) — 1 for every odd prime q < p, and 

3. Lp^2 is not a perfect square. 

In other words, Lp^2 is a square modulo all primes up to p, but is not a square. 
We found the following new pseudosquares: 



p 


Lp,2 


367 
373 
379 


36553 34429 47705 74600 46489 
42350 25223 08059 75035 19329 

> 10^5 



The two pseudosquares listed were found in 2008 in a computation that went up 
to 5 X 1024, taking roughly 3 months wall time. The final computation leading 
to the lower bound of 10^^ ran for about 6 months, in two 3-month pieces, the 
second of which finished on January 1st, 2010. 

Wooding and Williams '12[ had found a lower bound of L^qj ^ > 120120 x 
2^^ w 2.216 X 10^4. (Note: a complete table of pseudosquares, current as of this 
writing, is available at http://cr.yp.to/focus.html care of Dan Bernstein). 

Similarly, for an odd prime p, let Lp^^, the pseudocube for p, be the smallest 
positive integer such that 

1. Lp,3 = ±1 (mod 9), 

2. I/p'g ^■'^'^ = 1 (mod q) for every prime q < p, q = 1 (mod 3), 

3. gcd(Lp_3,q') = 1 for every prime q < p, and 

4. Lp^3 is not a perfect cube. 

We found the following new pseudocubes (only listed for p = 1 (mod 3)): 



p 


Lp,3 


499 


601 25695 21674 16551 89317 


523,541 


1166 14853 91487 02789 15947 


547 


41391 50561 50994 78852 27899 


571,577 


1 62485 73199 87995 69143 39717 


601,607 


2 41913 74719 36148 42758 90677 


613 


67 44415 80981 24912 90374 06633 


619 


> 1027 



These pseudocubes were found in about 6 months of total wall time in 2009. 
Wooding and Williams [12] had found a lower bound of ^499, 3 > 1.45152 x 10^^. 
For a complete list of known pseudocubes, see |12|4|10j . 



3 The Distribution of Pseudosquares and Pseudocubes 



Let Pi denote the ith prime, and qi denote the ith prime such that qi = 1 (mod 3). 
In [51 it was conjectured that, for a constant C2 > 0, we have 

ip„,2 « C22"logp„. (1) 

Using similar methods, in it was conjectured that, for a constant C3 > 0, we 
have 

i9„,3«C33"aogg„)2. (2) 

In a desire to test the accuracy of these conjectures, for integers n > let us 
define 

C2(n) := , (3) 

' 2"logp„' ^ ^ 

:= ^Jr^- (4) 

3" (log qnY 

We calculated C2{n) and 03(71) from known pseudosquares and pseudocubes. We 
present these computations in Table [U for pseudosquares, and in Table [21 for 
pseudocubes, below. 

From Table [TJ we readily see that C2(n) appears to be bounded between 
roughly 5 and 162, with an average value near 45. There is no clear trend toward 
zero or infinity. Due to the common occurence of values of n where ip„,2 = 
ip„+i,2 (for example, n = 56), it should also be clear C2{n) does not have a 
limit. 

Similarly for the pseudocubes, in Table [5] we see that 0.05 < 03(11) < 6.5 for 
10 < n < 53, with an average value of roughly 1.22. And again, there is no clear 
trend toward zero or infinity, nor can there be a limit for C3(n). 

This leads us to the following refinements, if you will, of the conjectures 
([I]),© above. 

Conjecture. For the pseudosquares, we conjecture that 

liminf >0, (5) 

2"logp„ 

limsup- — — < 00. (6) 
Similarly, for the pseudocubes, we conjecture that 



liminf f'^"-^ > 0, (7) 
" - 3"(log(j„) 



limsup f^"'' <oo. (8) 

n—^oo o ViOg(/„J 

Our data also has implications on the relative efficiently of primality testing. 
In particular, several researchers have pointed out that if conjectures ([I]),© are 



Table 1. Values of C2(n) based on known pseudosquares. 



11 


p,, 




C2 


a) 


n 


Pi, 


L,.,..2 


C2(n) 


2 


3 


73 


16 


61 


39 


167 


112434732901969 


39.96 


3 


5 


241 


18 


72 


40 


173 


178936222537081 


31.58 


4 


7 


1009 


32 


41 


41 


179 


178936222537081 


15.69 


5 


11 


2641 


34 


42 


42 


181 


696161110209049 


30.45 


6 


13 


8089 


49 


28 


43 


191 


696161110209049 


15.07 


7 


17 


18001 


49 


64 


44 


193 


2854909648103881 


30.84 


8 


19 


53881 


71 


48 


45 


197 


6450045516630769 


34.70 


9 


23 


87481 


54 


49 


46 


199 


6450045516630769 


17.32 


10 


29 


117049 


33 


95 


47 


211 


11641399247947921 


15.46 


11 


31 


515761 


73 


34 


48 


223 


11641399247947921 


7.65 


12 


37 


1083289 


73 


24 


49 


227 


190621428905186449 


62.42 


13 


41 


3206641 


105 


41 


50 


229 


196640148121928601 


32.14 


14 


43 


3818929 


61 


97 


51 


233 


712624335095093521 


58.06 


15 


47 


9257329 


73 


38 


52 


239 


1773855791877850321 


71.92 


16 


53 


22000801 


84 


55 


53 


241 


2327687064124474441 


47.12 


17 


59 


48473881 


90 


70 


54 


251 


6384991873059836689 


64.15 


18 


61 


48473881 


44 


98 


55 


257 


8019204661305419761 


40.11 


19 


67 


175244281 


79 


49 


56 


263 


10198100582046287689 


25.40 


20 


71 


427733329 


95 


70 


57 


269 


10198100582046287689 


12.65 


21 


73 


427733329 


47 


54 


58 


271 


10198100582046287689 


6.32 


22 


79 


898716289 


49 


04 


59 


277 


69848288320900186969 


21.54 


23 


83 


2805544681 


75 


69 


60 


281 


208936365799044975961 


32.14 


24 


89 


2805544681 


37 


25 


61 


283 


533552663339828203681 


40.99 


25 


97 


2805544681 


18 


28 


62 


293 


936664079266714697089 


35.76 


26 


101 


10310263441 


33 


29 


63 


307 


936664079266714697089 


17.73 


27 


103 


23616331489 


37 


96 


64 


311 


2142202860370269916129 


20.23 


28 


107 


85157610409 


67 


89 


65 


313 


2142202860370269916129 


10.10 


29 


109 


85157610409 


33 


81 


66 


317 


2142202860370269916129 


5.04 


30 


113 


196265095009 


38 


67 


67 


331 


13649154491558298803281 


15.94 


31 


127 


196265095009 


18 


87 


68 


337 


34594858801670127778801 


20.14 


32 


131 


2871842842801 


137 


15 


69 


347 


99492945930479213334049 


28.81 


33 


137 


2871842842801 


67 


95 


70 


349 


99492945930479213334049 


14.39 


34 


139 


2871842842801 


33 


88 


71 


353 


295363187400900310880401 


21.32 


35 


149 


26250887023729 


152 


68 


72 


359 


295363187400900310880401 


10.63 


36 


151 


26250887023729 


76 


14 


73 


367 


3655334429477057460046489 


65.54 


37 


157 


112434732901969 


161 


79 


74 


373 


4235025223080597503519329 


37.86 


38 


163 


112434732901969 


80 


30 











Table 2. Values of C3(n) based on known pseudocubes. 



n 






C3(n) 


n 






C3(n) 


10 


79 


7235857 


6.42 


32 


337 


75017625272879381 


1.2 


11 


97 


8721539 


2.35 


33 


349 


75017625272879381 


0.394 


12 


103 


8721539 


0.764 


34 


367 


996438651365898469 


1.71 


13 


109 


91246121 


2.6 


35 


373 


2152984914389968651 


1.23 


14 


127 


91246121 


0.813 


36 


379 


12403284862819956587 


2.34 


15 


139 


98018803 


0.281 


37 


397 


37605274105479228611 


2.33 


16 


151 


1612383137 


1.49 


38 


409 


37605274105479228611 


0.77 


17 


157 


1612383137 


0.488 


39 


421 


37605274105479228611 


0.254 


18 


163 


7991083927 


0.795 


40 


433 


2058300390063371 14403 


0.459 


19 


181 


7991083927 


0.254 


41 


439 


1845193818928603436441 


1.37 


20 


193 


7991083927 


0.0827 


42 


457 


7854338425385225902393 


1.91 


21 


199 


20365764119 


0.0695 


43 


463 


12904554928068268848739 


1.04 


22 


211 


2515598768717 


2.8 


44 


487 


1338480954852 122 75 1 7303 


0.355 


23 


223 


6440555721601 


2.34 


45 


499 


60125695216741655189317 


0.527 


24 


229 


29135874901141 


3.49 


46 


523 


116614853914870278915947 


0.336 


25 


241 


29135874901141 


1.14 


47 


541 


116614853914870278915947 


0.111 


26 


271 


29135874901141 


0.365 


48 


547 


4139150561509947885227899 


1.31 


27 


277 


406540676672677 


1.69 


49 


571 


16248573199879956914339717 


1.69 


28 


283 


406540676672677 


0.558 


50 


577 


16248573199879956914339717 


0.56 


29 


307 


406540676672677 


0.181 


51 


601 


24191374719361484275890677 


0.274 


30 


313 


406540676672677 


0.0598 


52 


607 


24191374719361484275890677 


0.0912 


31 


331 


75017625272879381 


3.61 


53 


613 


674441580981249129037406633 


0.845 



true, then the running time of the pseudocube prime test, which depends on 

2/3 

the value of L^^ 3, should eventually outperform the pseudosquare prime test, 
whose running time depends on -/jp„,2- In particular, one infers from conjectures 
([U and © that 

» (V) > ' 

for sufficiently large n (see [T^l §9.1]). This inference follows from our refined 
conjectures as well. 

We have our first specific value of n to support namely with n = 48, where 
L^f^3 K. 2.214 • Lp„,2. However, given that C2(n) averages about 45, and 03(71) 
averages just over 1.2, we would reasonably expect ([9]) to largely be true only for 
n larger than about 75, under the assumption these averages are maintained. To 
test this, more pseudosquares and, in particular, more pseudocubes are needed. 



4 Algorithm Details 

We begin with a review of doubly-focused enumeration, explain how we employ 
parallelism, and how the space-saving wheel datastructure is utilized. We also 
discuss the details of our implementation, including the hardware platform and 
software used. 

4.1 Doubly- Focused Enumeration 

The main idea is that every integer x, with < x < H , can be written in the 
form 

X = tpMn - tnMp (10) 

where 

gcd(Mp,M„) = 1, 0<tp< H+^^^^P ^ and < t„ < M„. (11) 

(See [5] or [T^ Lemma 1].) This is an explicit version of the Chinese Remainder 
Theorem. 

To find pseudosquares, we set M„ and Mp to be products of small odd primes 
and 8, choose tp to be square modulo Mp, and —tn to be square modulo M„. To 
be precise, in our implementation we set 

Mp = 7 • 11 • 13 • 17 ■ 19 • 23 • 29 • 31 • 37 • 41 • 43 • 53 • 89 

^ 2057 04617 33829 17717 and 
M„ = 8 • 3 • 5 • 47 • 59 • 61 • 67 • 71 • 73 • 79 • 83 • 97 

= 4483 25952 77215 26840. 

Note that both Mp, M„ < 2^**, allowing us to work in 64-bit machine arithmetic. 



To find pseudocubes, the same idea applies, only note that if — f„ is a cube 
modulo Mn, so is We used only 2,9 and primes congruent to 1 (mod 3) for 
better filter rates: 

Mp = 2 • 7 • 13 • 31 • 43 ■ 73 • 79 • 127 • 139 • 157 • 181 

= 701 85635 61110 39402 and 
M„ = 9 ■ 19 • 37 ■ 61 • 67 ■ 97 • 103 • 109 • 151 • 163 
= 69311050 4329192503 

4.2 Parallelism and Main Loop 

Each processor core was assigned an interval of tp values to process by giving it 
values of H~ and . 

For finding pseudosquares, — H~ k, Mn • 4.76 x 10^^. For finding pseu- 
docubes, H+ -H- KMn- 4.99 X 10^2 

Parallelism was achieved by having different processors working on different 
intervals simultaneously. Once all processors had finished their current intervals, 
the work was saved to disk (allowing restarts as needed) and new intervals were 
assigned. 

To process an interval, each processor core did the following: 

1. Using the wheel datastructurc, generate all square or cube values of tp with 
H~ < tpMn < H~^, and store these in an array A[] . 

2. The wheel datastructure does not generate the tp values in order, so sort A [] 
in memory using quicksort. Note that H~ and arc chosen close enough 
together so that this array held no more than 40 million integers, using at 
most 320 megabytes of RAM per processor core. 

3. Using the first and last entries in A [] , compute a range of valid t„ values to 
process, and then use a wheel datastructure to generate all tn values in that 
range such that —tn is square modulo M„ for pseudosquares, or tn is a cube 
modulo Mn for pseudocubes. 

We use an outer loop over tn values in the order enumerated by the wheel 
data structure for M„, and an inner loop over consecutive tp values drawn 
from A [] . 

4. For each tn generated, we normalize sieve tables for the next 4 primes 
(101, 103, 107, 109 for pseudosquares, and 193, 199, 211, 223 for pseudocubes) 
to allow for constant-time table lookup to see if an x- value (see below) is a 
square/cube modulo these primes, indexed by tp value. 

The number of primes to use for this depends on how many tp values will 
be processed for each t„ - in our case, it was several hundred on average, so 
this step improves performance. If it were fewer, say 50, then normalizing the 
sieve tables would require more work than is saved by having constant-time 
lookup. 

5. For each tn generated, using binary search on A[] to find all the tp values 
it can match with, generate an a; = tpMn — tnMp within our global search 



range. (For example, in our last run for pseudosquares, we searched for x 
values between 7.5 x 10^"* and 10^^.) 

Note: at this point we do not actually compute the value of x. 

6. Lookup each tp value in the normalized tables mentioned above. If it fails 
any of the 4 sieve tests, move on to the next tp value. For pseudosquares, a 
tp values passes these tests with probability roughly (1/2)* = 1/16, and for 
pseudocubes, roughly (1/3)"* = 1/81. 

Note that this step is the running time bottleneck of the algorithm. 

7. The next batch of primes q have precomputed sieve tables that are not 
normalized, but we precompute Adp and M„ modulo each q so the we can 
compute X mod q without exceeding 64-bit arithmetic. Continue only if our 
tp value passes all these sieve tests as well. The expected number of primes 
q used in this step is constant. 

8. Finally, compute x using 128-bit hardware arithmetic, and see if it is a perfect 
square or perfect cube. If it passes this test, append x to the output file for 
this processor core. 

We had two wheel datastructures, one each for Mp and M„. For details on how 
this datastructure works, see [5]. We leave the details for how to modify the 
datastructure to handle cubes in place of squares to the reader. 

4.3 Implementation Details 

To compute the tables presented in ^ we used Butler University's cluster su- 
percomputer, BigDawg, which has 24 compute nodes, each of which has four 
AMD Opteron 8354 quad-core CPUs at 2.2GHz with 512KB cache, for a total 
of 384 compute cores. As might be expected, we did not have sole access to this 
machine for over a year, so the code was designed, and ran, using anywhere from 
10 to 24 nodes, or from 160 to 384 cores, depending on the needs of other users. 
This flexibility is one advantage of our parallelization method ~ by tp intervals. 
In [12], they parallelized over residue classes, which restricts the CPU count to 
a fixed number (180 in their case). 

BigDawg runs a Linux kernel on its head node and compute nodes, and the 
code was written in C-I--I- using the gnu compiler (version 4.1.2) with MPI. It 
has both 10GB ethernet and Infiniband interconnect, but inter-processor com- 
munication was not a bottleneck for our programs. 

We tested our code by first finding known pseudosquares (all but the highest 
few) and known pseudocubes, in the process verifying previous results. 

5 Future Work 

We plan to port our code to work with 8 NVidia CPUs recently added to Butler's 
supercomputer, giving it roughly 2-3 times the raw computing power. This will 
require a major restructuring of the code, and the removal of recursion in the 
wheel datastructure. 
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